Add binned summary statistic aggregation for genomic intervals — Closes #61 (#62)

Open
conradbzura wants to merge 45 commits into main from
61-binned-coverage-operator

Conversation

@conradbzura
Collaborator

@conradbzura conradbzura commented Mar 10, 2026

Summary

Add the COVERAGE operator for computing binned genome coverage from interval data. The operator tiles the genome into fixed-width bins using generate_series, LEFT-JOINs against the source table, and aggregates overlapping intervals per bin. It supports configurable statistics (count, mean, sum, min, max), an optional target column for aggregating a named field instead of interval length, and both the := and => named-parameter syntaxes. The generated SQL runs against both DuckDB and PostgreSQL.

In addition to the operator itself, this branch lands a unit-test suite mirroring src/giql/ under tests/unit/, a bedtools-backed integration-correctness suite under tests/integration/bedtools/test_correctness_*.py, and the shared test-infrastructure helpers that support them. Every test follows the project's BDD naming, Given-When-Then docstrings, and Arrange-Act-Assert phase comments.

Closes #61

Proposed changes

Expression node (GIQLCoverage)

Register GIQLCoverage as a SQLGlot Func. The from_arg_list classmethod delegates to the shared _split_named_and_positional helper alongside the other Func subclasses, separating named parameters via PropertyEQ (:=) and Kwarg (=>) from positional arguments. Parameters: this (interval column), resolution (bin width, required), stat (aggregation function), and target (column to aggregate).
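The named/positional split performed by the shared helper can be sketched in pure Python. The node classes below are illustrative stand-ins shaped like SQLGlot's PropertyEQ and Kwarg nodes, not the library's API, and the helper name mirrors but is not the project's `_split_named_and_positional`:

```python
from dataclasses import dataclass


@dataclass
class PropertyEQ:
    """Stand-in for a `name := value` argument node."""
    name: str
    value: object


@dataclass
class Kwarg:
    """Stand-in for a `name => value` argument node."""
    name: str
    value: object


def split_named_and_positional(args):
    """Separate named parameters (:= / =>) from positional arguments."""
    positional, named = [], {}
    for arg in args:
        if isinstance(arg, (PropertyEQ, Kwarg)):
            named[arg.name] = arg.value
        else:
            positional.append(arg)
    return positional, named


# COVERAGE(interval, resolution => 1000, stat := 'mean') splits into
# one positional argument and two named parameters:
pos, named = split_named_and_positional(
    ["interval", Kwarg("resolution", 1000), PropertyEQ("stat", "mean")]
)
assert pos == ["interval"]
assert named == {"resolution": 1000, "stat": "mean"}
```

Both spellings land in the same named-parameter dict, which is why := and => behave identically downstream.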

Parser registration

Register COVERAGE as a function in GIQLDialect so parse_one(..., dialect=GIQLDialect) produces a GIQLCoverage AST node. Exclude = from named-parameter syntax to avoid ambiguity with SQL equality.

Transformer (CoverageTransformer)

Rewrite queries containing COVERAGE(...) into a CTE-based plan:

  1. A __giql_chroms subquery computes per-chromosome MAX(end) from the source table. When the user-query FROM clause carries an alias, the subquery carries the same alias so alias-qualified WHERE predicates resolve.
  2. A __giql_bins CTE uses CROSS JOIN LATERAL generate_series(0, max_end - 1, resolution) to tile each chromosome — the - 1 clamp avoids a spurious trailing empty bin when MAX(end) lands on a bin boundary.
  3. A LEFT JOIN matches source intervals to bins, with any original WHERE conditions moved into the ON clause to preserve zero-coverage bins.
  4. GROUP BY and the selected aggregate — a typed SQLGlot node (exp.Count, exp.Avg, exp.Sum, exp.Min, exp.Max) — produce one row per bin.
  5. When no explicit AS alias is provided, the aggregate defaults to value.
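The endpoint clamp in step 2 can be illustrated with plain Python, since generate_series is endpoint-inclusive on both backends. The max_end and resolution values below are illustrative:

```python
def bin_starts(max_end: int, resolution: int) -> list[int]:
    """Bin start positions, mirroring generate_series(0, max_end - 1, resolution).

    range() excludes its stop, so range(0, max_end, resolution) equals the
    inclusive series with the - 1 clamp applied.
    """
    return list(range(0, max_end, resolution))


def unclamped_bin_starts(max_end: int, resolution: int) -> list[int]:
    """What an unclamped generate_series(0, max_end, resolution) would emit."""
    return list(range(0, max_end + 1, resolution))


# An interval ending exactly at position 1000 with resolution=1000:
assert bin_starts(1000, 1000) == [0]                   # only the occupied bin
assert unclamped_bin_starts(1000, 1000) == [0, 1000]   # spurious trailing bin
```

The spurious trailing value only appears when MAX(end) lands exactly on a bin boundary, which is why the clamp matters despite being a one-character change.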

User CTEs declared before the COVERAGE query are preserved alongside the internal __giql_bins CTE, so the operator composes with pre-filtering via the CTE-wrap pattern. Non-table FROM clauses (inline subqueries, VALUES) raise a typed ValueError at transpile time with a message pointing users at the CTE workaround. stat, target, and resolution must be string or integer literals — non-literal arguments raise typed errors rather than silently stringifying AST nodes. Negative or zero resolutions are rejected at transpile time. The default COUNT aggregate counts a non-null source column (not source.*), which gives value=0 for zero-coverage bins on both DuckDB and PostgreSQL.

CoverageTransformer reuses ClusterTransformer._get_table_name/_get_genomic_columns via delegation rather than duplicating them.

Documentation

Add COVERAGE to the aggregation-operators reference (docs/dialect/aggregation-operators.rst) with syntax, parameter descriptions (including the required flag on resolution), a sample output table, a LATERAL + generate_series compatibility note (DuckDB/PostgreSQL only — SQLite unsupported), and a "Supported FROM clauses" section explaining the CTE workaround for inline subqueries. List COVERAGE in the dialect operators index (docs/dialect/index.rst). Add a recipes page (docs/recipes/coverage.rst) with biological-context framing and patterns for basic coverage, named-resolution syntax, coverage statistics, filtered coverage, strand-specific coverage, target-column aggregation, 5' end counting (with correctly-quoted reserved-keyword identifiers), and RPM normalisation.

Unit tests

Add tests/unit/ mirroring src/giql/ with module-level test files for dialect, expressions, generators/base, table, transformer, and transpile. TestGIQLCoverage and TestCoverageTransformer cover parsing and transpilation end-to-end, including property-based tests with explicit @settings(max_examples=50) control, DuckDB execution tests that assert per-bin values across the full tiling, and regression tests for every correctness concern: LEFT-JOIN zero-coverage semantics, alias-qualified WHERE, user-CTE preservation, endpoint-clamped generate_series, non-positive-resolution rejection, and non-literal-stat/target rejection.

Integration tests

Add four correctness files under tests/integration/bedtools/: test_correctness_intersect.py, test_correctness_merge.py, test_correctness_nearest.py, and test_correctness_workflows.py. They exercise real bedtools subprocesses via pybedtools and compare against GIQL-transpiled SQL executed on real DuckDB. Tests are tagged with an integration pytest marker (registered in pyproject.toml) declared at module level in every integration file. A new generate_random_intervals helper produces deterministic multi-chromosome samples for scale tests.

Test infrastructure

Move the bedtools helper tests (test_bedtools_wrapper.py, test_comparison.py, test_duckdb_loader.py, test_data_models.py) to live alongside the helpers in tests/integration/bedtools/utils/, where they mirror their source modules. load_intervals accepts empty interval lists and creates an empty table rather than propagating duckdb.InvalidInputException.

Test cases

| # | Test module | Covers |
|---|-------------|--------|
| 1 | tests/unit/test_expressions.py::TestGIQLCoverage | COVERAGE parsing: positional args, named args via := and => syntax, all three parameter names (stat, target, resolution) reachable by either named form, plus property-based tests across 1-10M resolutions and every valid stat value |
| 2 | tests/unit/test_transformer.py::TestCoverageTransformer | COVERAGE transpilation: CTE structure, stat→SQL aggregate mapping, target-column aggregation, default value alias, explicit alias preservation, WHERE→ON migration, column qualification, alias propagation into chroms subquery, custom column mappings, table aliases, CTE recursion, multi-COVERAGE rejection, invalid-stat rejection, non-positive-resolution rejection, subquery-FROM rejection, non-literal-stat/target rejection, user-CTE preservation, endpoint-clamp off-by-one, and end-to-end DuckDB execution across all five stats plus zero-coverage and WHERE-preserves-zero-bins scenarios |
| 3 | tests/unit/test_dialect.py | Dialect-level parsing: GIQLDialect produces GIQLCoverage nodes from COVERAGE calls; other operators (CLUSTER, MERGE, DISTANCE, NEAREST) still parse correctly |
| 4 | tests/unit/test_generators_base.py | BaseGIQLGenerator produces correct SQL for all operators, including alias resolution and custom column mappings |
| 5 | tests/unit/test_table.py, tests/unit/test_transpile.py | Table/Tables registry behavior and end-to-end transpile() entry point |
| 6 | tests/integration/bedtools/test_correctness_intersect.py | INTERSECT parity with bedtools intersect across overlap configurations, containment, deduplication, non-standard chroms, large intervals, scale, and same/opposite-strand filters |
| 7 | tests/integration/bedtools/test_correctness_merge.py | MERGE parity with bedtools merge across topology configurations, overlap boundaries, unsorted input, per-chromosome segregation, strand-specific merge, and large-scale datasets |
| 8 | tests/integration/bedtools/test_correctness_nearest.py | NEAREST parity with bedtools closest across overlap, upstream, downstream, multi-query, k>1, strand-aware, cross-chromosome, and scale scenarios — with distance assertions pinned to the bedtools 2.31 half-open gap convention |
| 9 | tests/integration/bedtools/test_correctness_workflows.py | Chained pipelines: intersect→merge, nearest-then-filter-distance, merge→intersect, stranded intersect→merge, intersect-filter-merge, and the full intersect→merge→nearest pipeline with each GIQL stage feeding the next |
| 10 | tests/integration/bedtools/utils/test_* | Test infrastructure: bedtools wrapper, result comparison, DuckDB loader (including the empty-input success case), and GenomicInterval/ComparisonResult data models |

Total: 533 tests (357 unit + 176 integration). Every test follows BDD naming (test_<method>_should_<outcome>[_when_<condition>]), Given-When-Then docstrings on test functions and methods, and Arrange-Act-Assert phase comments.
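A minimal sketch of those three conventions together (the function under test here is hypothetical, invented purely for illustration):

```python
def add_interval_lengths(intervals: list[tuple[int, int]]) -> int:
    """Sum half-open interval lengths (hypothetical helper for illustration)."""
    return sum(end - start for start, end in intervals)


def test_add_interval_lengths_should_return_zero_when_list_is_empty():
    """Test that an empty interval list sums to zero.

    Given: an empty list of intervals
    When: add_interval_lengths is called
    Then: the result is 0
    """
    # Arrange
    intervals: list[tuple[int, int]] = []
    # Act
    result = add_interval_lengths(intervals)
    # Assert
    assert result == 0
```

The name carries the method, outcome, and condition; the docstring restates the scenario in Given/When/Then form; the phase comments mark the Arrange-Act-Assert body.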

@conradbzura
Collaborator Author

test_coverage.py has a number of tests that can be consolidated into a few property-based tests. We also need functional tests confirming query behavior — can use DuckDB for this.

@conradbzura conradbzura self-assigned this Mar 12, 2026
- ``'min'`` — minimum interval length of overlapping intervals
- ``'max'`` — maximum interval length of overlapping intervals

When ``target`` is specified, the stat is applied to that column instead of interval length.
Collaborator Author

@nvictus is interval length a sane default for target metric?

Define a new GIQLCoverage(exp.Func) AST node with this, resolution,
and stat arg_types. The from_arg_list classmethod handles both
positional and named parameters (EQ and PropertyEQ for := syntax).
Register COVERAGE in GIQLDialect.Parser.FUNCTIONS so the parser
recognises it.
CoverageTransformer rewrites SELECT COVERAGE(interval, N) queries
into a CTE-based plan: a __giql_bins CTE built from generate_series
via LATERAL, LEFT JOINed to the source table on overlap, with
GROUP BY and the appropriate aggregate (COUNT, AVG, SUM, MIN, MAX).
Wire the transformer into the transpile() pipeline before MERGE and
CLUSTER.
TestCoverageParsing (3 tests) verifies positional args, named stat
via :=, and named resolution. TestCoverageTranspile (11 tests) covers
basic transpilation, stat variants (mean/sum/max), custom column
mappings, WHERE preservation, additional SELECT columns, table alias
handling, resolution in generate_series, overlap join conditions, and
ORDER BY output.
Add a COVERAGE section to aggregation-operators.rst with description,
syntax, parameters, return value, examples, and related operators.
Create docs/recipes/coverage.rst with strand-specific coverage,
coverage statistics, filtered coverage, 5-prime end counting, and
RPM normalisation recipes. Add coverage to the recipe index.
Add exp.Kwarg handling alongside exp.PropertyEQ in from_arg_list so
that COVERAGE(interval, 1000, stat => 'mean') works identically to
the := form. Update the reference docs to show both syntaxes and add
a parsing test for the => form.
The = operator inside a function call is an equality comparison in
standard SQL, not parameter assignment. Only := (PropertyEQ) and
=> (Kwarg) are valid named parameter syntaxes. This makes COVERAGE
consistent with SQL semantics and allows = to be used as a boolean
expression argument.
Remove unused generate_series_sql variable and unwrap the redundant
exp.Subquery wrapper inside exp.Lateral. The old form emitted
CROSS JOIN LATERAL (GENERATE_SERIES(...)) which DuckDB rejects due
to the extra parentheses. The new form emits
CROSS JOIN LATERAL GENERATE_SERIES(...) which works on both DuckDB
and PostgreSQL.
Add optional target parameter to GIQLCoverage that specifies which
column to aggregate instead of defaulting to interval length
(end - start). When target is set, COUNT uses COUNT(target_col)
instead of COUNT(*), and other stats (mean, sum, min, max) aggregate
the named column.

Bare COVERAGE expressions without an explicit AS alias now default
to AS value.
The original query's WHERE was applied to the outer query, which
filtered out zero-coverage bins because source columns are NULL
for non-matching LEFT JOIN rows (NULL > threshold evaluates to
FALSE). Moving the WHERE into the JOIN's ON clause preserves all
bins while still filtering which source rows participate.

Also qualify unqualified column references with the source table
in both the JOIN ON condition and the chroms subquery WHERE to
avoid ambiguous column errors.
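The WHERE-to-ON distinction is standard LEFT JOIN semantics, so it can be demonstrated with the stdlib sqlite3 module even though the operator itself targets DuckDB/PostgreSQL. Table and column names below are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bins(bin_start INTEGER);
    CREATE TABLE features(bin_start INTEGER, score INTEGER);
    INSERT INTO bins VALUES (0), (1000);
    INSERT INTO features VALUES (0, 50);   -- only bin 0 has a feature
""")

# Filter in WHERE: the zero-coverage bin is dropped, because its joined
# columns are NULL and NULL > 10 does not evaluate to TRUE.
where_rows = conn.execute("""
    SELECT b.bin_start, COUNT(f.bin_start)
    FROM bins b LEFT JOIN features f ON f.bin_start = b.bin_start
    WHERE f.score > 10
    GROUP BY b.bin_start
    ORDER BY b.bin_start
""").fetchall()

# Filter in ON: every bin survives; unmatched bins count 0.
on_rows = conn.execute("""
    SELECT b.bin_start, COUNT(f.bin_start)
    FROM bins b LEFT JOIN features f
      ON f.bin_start = b.bin_start AND f.score > 10
    GROUP BY b.bin_start
    ORDER BY b.bin_start
""").fetchall()

assert where_rows == [(0, 1)]
assert on_rows == [(0, 1), (1000, 0)]
```

Moving the predicate into ON filters which source rows participate in the join without discarding the bins themselves, which is exactly what binned coverage needs.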
Replace the ad-hoc test classes with two spec-aligned classes:

- TestGIQLCoverage (10 tests): example-based parsing for positional
  args, :=/=> named params, target parameter, and all-named-params;
  property-based tests for stat+resolution combos, positional-only,
  and target syntax variants.

- TestCoverageTransformer (26 tests): instantiation, basic
  transpilation, all five stats, target with count/non-count,
  default and explicit aliases, WHERE-to-ON migration with column
  qualification, custom column mapping, table alias, resolution
  propagation, CTE nesting, error paths (invalid stat, multiple
  COVERAGE), and five DuckDB end-to-end functional tests.

Update docs to document the target parameter, default value alias,
and add a recipe for aggregating a specific column.
Cover bedtools_wrapper, comparison, data_models, and duckdb_loader
utility modules used by the integration test suite.
Cover dialect parser, expression nodes, BaseGIQLGenerator, table
metadata, ClusterTransformer, MergeTransformer, CoverageTransformer,
and the public transpile() API.
Compare GIQL INTERSECTS, MERGE, and NEAREST output against bedtools
results across edge cases, strand handling, scale, and multi-step
workflow pipelines.
@conradbzura conradbzura force-pushed the 61-binned-coverage-operator branch from 039baae to 1fba22a on April 9, 2026 15:30
COUNT(source.*) in the no-target branch counted LEFT-JOIN-unmatched
rows as 1 on DuckDB and 0 on PostgreSQL. DuckDB includes all-NULL
composite rows in the count, so every bin with no overlapping interval
returned value=1 instead of 0 — violating the portability and
canonicality principles and contradicting the PR's own zero-coverage
test case.

Counting the source chrom column instead collapses unmatched rows to
NULL on both backends, so empty bins correctly return 0.

Add a regression test that forces genuinely-empty middle bins (two
intervals 2kb apart with resolution=500), asserting value=0 on the
four intermediate bins. This test failed on DuckDB before the fix.
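The underlying distinction (COUNT(*) counts rows while COUNT(column) skips NULLs) is standard SQL and can be sketched with the stdlib sqlite3 module; DuckDB's COUNT(source.*) composite-row behaviour itself is backend-specific and not reproduced here. Names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE bins(bin_start INTEGER);
    CREATE TABLE source(chrom TEXT, bin_start INTEGER);
    INSERT INTO bins VALUES (0), (500);
    INSERT INTO source VALUES ('chr1', 0);   -- bin 500 has no overlap
""")

rows = conn.execute("""
    SELECT b.bin_start,
           COUNT(*)       AS count_star,  -- counts the NULL-extended row too
           COUNT(s.chrom) AS count_col    -- NULLs collapse to 0
    FROM bins b LEFT JOIN source s ON s.bin_start = b.bin_start
    GROUP BY b.bin_start
    ORDER BY b.bin_start
""").fetchall()

# The unmatched bin gets count_star=1 but count_col=0:
assert rows == [(0, 1, 1), (500, 1, 0)]
```

Counting a non-null source column is therefore the portable way to make empty bins report 0.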
When a COVERAGE query's FROM clause uses a table alias and the WHERE
qualifies columns by that alias (FROM features f WHERE f.score > 10),
the alias-qualified predicate was copied verbatim into the
__giql_chroms subquery but the subquery's own FROM used only the
bare table name. DuckDB rejected the resulting SQL with a binder
error because the alias was unresolvable inside the subquery.

Forward the alias into the chroms subquery's FROM so alias-qualified
columns resolve. Add a regression test that runs COVERAGE over
FROM features f WHERE f.score > 10 against DuckDB and asserts the
filtered result contains the expected bins.
The final query built by CoverageTransformer.set("with_", ...)
unconditionally replaced query.args["with_"], dropping any CTE the
user declared before SELECT COVERAGE(...). WITH selected AS (...)
SELECT COVERAGE(interval, 1000) FROM selected produced a query that
referenced "selected" after the CTE had been thrown away, failing at
bind time with "Table does not exist".

Mirror the MergeTransformer fix from 185b716 by merging existing
user CTEs with the newly-built __giql_bins CTE. Add a regression
test that wraps COVERAGE over a user-defined CTE and asserts the
filtered bin count.
Negative literals parsed as Neg(Literal(N)) and the existing fallback
int(str(resolution_expr.this)) silently stripped the sign, so
COVERAGE(interval, -1) emitted generate_series(0, __max_end, 1) — a
valid but wildly different query than what the user wrote. A zero
resolution was also accepted and produced a query that failed at
execution with a backend-specific "step cannot be zero" error
instead of a clear GIQL error.

Replace the lossy str-conversion fallback with explicit Neg-of-Literal
handling that preserves the sign, then validate resolution > 0 and
raise a typed ValueError otherwise. Add regression tests for negative
and zero resolutions.
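The sign-preserving extraction can be sketched in pure Python with stand-in node classes shaped like SQLGlot's Neg and Literal (the class names mirror SQLGlot, but these stubs are illustrative, not the library's API):

```python
from dataclasses import dataclass


@dataclass
class Literal:
    """Stand-in for sqlglot exp.Literal; `this` holds the digits as a string."""
    this: str


@dataclass
class Neg:
    """Stand-in for sqlglot exp.Neg (unary minus wrapping a literal)."""
    this: Literal


def resolution_value(node) -> int:
    """Extract an integer resolution, preserving the sign of Neg nodes."""
    if isinstance(node, Neg) and isinstance(node.this, Literal):
        value = -int(node.this.this)   # keep the sign the lossy str() path lost
    elif isinstance(node, Literal):
        value = int(node.this)
    else:
        raise ValueError("COVERAGE resolution must be an integer literal")
    if value <= 0:
        raise ValueError(f"COVERAGE resolution must be positive, got {value}")
    return value


assert resolution_value(Literal("1000")) == 1000
# COVERAGE(interval, -1) now fails loudly instead of becoming resolution=1:
try:
    resolution_value(Neg(Literal("1")))
except ValueError as e:
    assert "positive" in str(e)
```

Validating at transpile time turns a backend-specific "step cannot be zero" runtime error into a clear GIQL error.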
GIQLCoverage.from_arg_list reinlined the named/positional split that
the four sibling Func subclasses already delegate to the module-level
helper. Replace the inline loop with a helper call so the parsing
behaviour stays in one place.
CoverageTransformer had byte-for-byte copies of _get_table_name and
a strand-less variant of _get_genomic_columns already on
ClusterTransformer. MergeTransformer handles this by holding a
ClusterTransformer and delegating; mirror that pattern here so
changes to the column-lookup logic only need to happen in one place.

_get_table_alias has no counterpart elsewhere and stays on
CoverageTransformer.
…rmer

Per the style guide's class-member ordering rule, public methods must
precede private helpers. _get_table_alias was sitting between __init__
and transform; move it below transform alongside the other private
methods so readers see the public entry point first.
generate_series is endpoint-inclusive on both DuckDB and PostgreSQL,
so calling it with (0, MAX(end), resolution) yields an extra series
element whenever MAX(end) lands exactly on a bin boundary. The LEFT
JOIN then emits a spurious bin beyond any interval in the data —
e.g., an interval ending at position 1000 with resolution=1000
produced bins [0,1000) and [1000,2000) instead of just the first.

Subtract one from MAX(end) inside generate_series so the series stops
at the last strictly-occupied base. Add a regression test that places
an interval ending exactly on a bin boundary and asserts only the
occupied bin is returned.
When the FROM clause is an inline subquery or VALUES rather than a
named table reference, _get_table_name returned None and the
transformer silently emitted SQL that (a) built the chroms subquery
with no FROM clause at all and (b) referenced a literal nonexistent
table named "source" in the LEFT JOIN. Users saw cryptic runtime
errors with no path back to their GIQL query.

Raise ValueError at transpile time with a clear message pointing at
the FROM clause. Keep CTE-by-name FROMs working — those parse as
exp.Table and yield the CTE name. Add a regression test using a
subquery FROM.
The stringify-fallback for non-Literal stat and target arguments
silently accepted AST nodes, stringified them, and tried to use the
result as a stat keyword or column name. For stat this produced a
confusing "Unknown COVERAGE stat '<stringified-node>'" message when
users wrote stat := some_col without quotes. For target it silently
succeeded because COUNT(source.score) happens to work when score is
a real column, masking user intent drift.

Replace both fallbacks with explicit Literal checks that raise a
typed ValueError. Add regression tests for both parameters.
The COVERAGE operator reference page did not explain what FROM shapes
are accepted, and the ValueError raised when an unsupported FROM is
encountered did not point users at the CTE-wrap workaround. Add a
"Supported FROM clauses" section with a before/after example and
extend the error message to tell users to wrap the derivation in a
WITH clause.

Also note that user-defined CTEs are preserved alongside the internal
__giql_bins CTE, so composing COVERAGE over a pre-filtering CTE is
the canonical way to work around the inline-subquery limitation.
The COVERAGE operator reference page existed but was not linked from
the dialect landing page's Aggregation Operators table, making the
new operator undiscoverable from the docs TOC. The MERGE page's
Related Operators section already cross-references COVERAGE, so the
index omission was also internally inconsistent.
"end" is a reserved keyword in ANSI SQL (used in CASE ... END) and
"start" is reserved in some dialects. DuckDB tolerates them unquoted
in many contexts but PostgreSQL in strict mode and other engines
will reject the unquoted identifiers. The rest of the docs and the
COVERAGE transpiler emit these as double-quoted identifiers, so the
recipe should match to stay portable across the GIQL-supported
backends.
The test was named _distance_zero with a docstring claiming "adjacent
intervals report distance=0", but the assertion accepted <= 1 and an
inline comment admitted bedtools 2.31+ reports 1. The name, docstring,
and assertion disagreed — the test body didn't actually verify the
behavior the name claimed, breaking the test guide's core principle.

Rename to describe the real behavior (finding the correct adjacent
neighbor — a parity test, not a distance-value claim), rewrite the
docstring in the required Given/When/Then format, and pin the
bedtools distance assertion to the canonical value of 1 for the
pinned bedtools >= 2.31.0 dependency. Add AAA phase comments while
rewriting the body.
The previous test promised "full pipeline (intersect -> merge ->
nearest) is run then each intermediate step matches bedtools" but
never ran the nearest step (intervals_c was loaded then unused) and
bypassed GIQL mid-pipeline by reconstructing step 1 with a
hand-written raw-SQL overlap predicate instead of feeding the
verified GIQL step-1 output into step 2.

Rewrite so that each stage's GIQL output is materialized as a table
and fed into the next GIQL stage — step 1 → step1_results table →
step 2 → step2_results table → step 3. Switch the inputs from
random generation to hand-crafted deterministic intervals so the
nearest step's correctness is unambiguous. Assert row equality for
intersect and merge; assert (a_name, b_name) pair equality for
nearest because bedtools 2.31+ uses the N+1 half-open gap distance
convention while GIQL uses N (distance-value parity is already
covered by the dedicated nearest tests).

Apply the BDD naming pattern, GWT docstring, and AAA phase comments
while rewriting the test body.
pytest only reads module-level pytestmark from test modules, not from
conftest.py. The conftest declared pytestmark = pytest.mark.integration
but none of the test files did, so -m integration and -m "not
integration" selection was a no-op and strict-markers mode would
error on the unregistered marker.

Register the integration marker in pyproject.toml and add the module
pytestmark to every test_*.py file under tests/integration/bedtools/.
Remove the dead pytestmark line from conftest.py to avoid implying
it propagates. With this change pytest -m integration now selects
all 103 integration tests and -m "not integration" correctly
deselects them.
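The resulting configuration looks roughly like this; the marker name matches the one the PR registers, and the pyproject.toml fragment is shown as a comment:

```python
# pyproject.toml:
# [tool.pytest.ini_options]
# markers = [
#     "integration: tests that exercise real bedtools subprocesses and DuckDB",
# ]

import pytest

# Module-level pytestmark in each tests/integration/bedtools/test_*.py file.
# pytest collects `pytestmark` from test modules, not from conftest.py, so
# this line (not a conftest declaration) is what makes -m selection work.
pytestmark = pytest.mark.integration
```

With the marker registered, `pytest -m integration` selects these modules, `-m "not integration"` deselects them, and `--strict-markers` no longer errors on an unknown marker.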
…on correctness tests

The four tests/integration/bedtools/test_correctness_*.py files used
ALL-CAPS GIVEN/WHEN/THEN docstrings without a leading summary, no
Arrange/Act/Assert phase comments, and test names that described
scenarios without a should_<outcome> clause. All three violate the
Python Test Guide's core principles (naming §3, GWT docstrings §4,
AAA phase comments §5), and the project has resolved the naming
conflict with the older constitution example in favour of the Test
Guide's BDD pattern.

Rename every test to test_<method>_should_<outcome>[_when_<condition>]
form, rewrite every docstring with a leading "Test <summary>." line,
blank line, and indented Given/When/Then blocks, and add
Arrange/Act/Assert phase comments to every test body. Also fix the
stale "grep chr1" reference in the intersect-filter-merge workflow
docstring to describe the actual Python-side filter.

Helper functions (_run_*_comparison, _load_and_query_nearest) keep
their plain docstrings since the guide restricts GWT to test
functions and methods. All 35 tests still pass.
Three files under tests/unit/ tested helpers that live in
tests/integration/bedtools/utils/. Per the test guide, tests must
mirror the directory structure of the modules under test, and tests
for test infrastructure should sit alongside the infrastructure —
not in the project-level unit test package. The same three files
also wrapped module-level function tests in Test<Function> classes,
violating the guide's rule that module-level functions get
module-level tests.

Move the files:
  tests/unit/test_bedtools_wrapper.py -> tests/integration/bedtools/utils/
  tests/unit/test_comparison.py       -> tests/integration/bedtools/utils/
  tests/unit/test_duckdb_loader.py    -> tests/integration/bedtools/utils/

Convert TestCreateBedtool, TestIntersect, TestMerge, TestClosest,
TestBedtoolToTuples, TestCompareResults, and TestLoadIntervals from
class wrappers to module-level functions. TestBedtoolsError stays a
class because it wraps a real exception class. Update imports to
relative form and add pytestmark = pytest.mark.integration to each
moved module (these exercise real bedtools subprocess + DuckDB).
… duplicate

tests/test_coverage.py duplicated TestGIQLCoverage (also present in
tests/unit/test_expressions.py) and TestCoverageTransformer (also
present in tests/unit/test_transformer.py) with stronger content in
the root file — more tests, proper AAA phase comments, proper
Given-When-Then docstrings, property tests, and end-to-end DuckDB
coverage. The root file also violated the test guide's mirror-the-
source-module convention because there is no src/giql/coverage.py
module.

Replace the weaker TestGIQLCoverage in tests/unit/test_expressions.py
with the 10-test version from the root file (up from 7). Replace the
weaker TestCoverageTransformer in tests/unit/test_transformer.py
with the 37-test version (up from 20). Carry over the Hypothesis,
duckdb, Table, and transpile imports needed by the merged content,
plus the VALID_STATS module constant. The to_df fixture already
lives in tests/conftest.py so it is auto-discovered by tests/unit/
without a new conftest.

Delete tests/test_coverage.py.

Net: 47 coverage tests added to unit/, 27 weaker duplicates removed, and
the 1222-line root file deleted, leaving 89 tests in the two touched unit
files after the merge (up from ~57).
tests/unit/test_data_models.py tests classes from
tests/integration/bedtools/utils/data_models.py. Same mirror-the-
source-module concern that motivated the prior helper-test
relocations — this one was missed because the original review
finding did not mention it explicitly.

Move to tests/integration/bedtools/utils/test_data_models.py,
switch to relative imports, add the integration marker. All 24
tests pass from the new location.
…tests

The test files under tests/unit/ (and the helper-tests moved into
tests/integration/bedtools/utils/) used a mix of spec-ID-prefixed
names (BG-001, CSM-001, CT-007, DC-001, GR-001, GD-001, MT-001,
SP-001, SSP-001, DI-001, NR-001, CL-001), scenario-only names
(test_contains, test_iter), and ALL-CAPS GIVEN/WHEN/THEN docstrings
without a leading summary. None of them had Arrange/Act/Assert
phase comments. All three violate Python Test Guide §3/§4/§5.

Sweep all ten files:
- Rename every test to test_<method>_should_<outcome>[_when_<condition>]
  form where <method> matches the __name__ of the method or function
  under test exactly; dunder methods use the dunder form
  (test___init___should_..., test___contains___should_..., etc.).
  Strip spec-ID prefixes from test names.
- Replace every docstring with the "Test <summary>." + blank line +
  indented Given:/When:/Then: block form required by §4.
- Add # Arrange / # Act / # Assert phase comments (or combined forms
  where phases are inseparable) to every test body.

Classes with already-compliant docstrings and AAA comments from the
earlier tests/test_coverage.py merge (TestGIQLCoverage in
test_expressions.py and TestCoverageTransformer in test_transformer.py)
had only their names updated per B4.10/B4.11/B4.12 — the strong
body style was preserved verbatim.

Helper functions (_make_tables, _transform_and_sql, _normalize, etc.)
and fixtures keep their plain docstrings per the guide's rule that
GWT applies only to test functions and methods. Imports, fixtures,
assertions, test data, and control flow are unchanged across all
touched files. All 533 tests pass.
Test Guide §7 requires property tests to use @settings to control
max_examples. Two property tests (in tests/integration/bedtools/utils/
test_comparison.py and test_data_models.py) had @given without any
@settings, and five @settings calls in tests/unit/test_expressions.py
and tests/unit/test_transformer.py only set suppress_health_check
without pinning max_examples — every one of them rode on Hypothesis
defaults.

Add @settings(max_examples=50) to the two bare property tests
(importing settings where needed) and add max_examples=50 to the
five existing @settings calls so all seven property tests run a
deliberately-chosen sample size.
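A sketch of the pattern; the strategy bounds are illustrative, and the test name is hypothetical:

```python
from hypothesis import given, settings, strategies as st


@settings(max_examples=50)  # deliberate sample size instead of Hypothesis defaults
@given(resolution=st.integers(min_value=1, max_value=10_000_000))
def test_resolution_should_be_positive_when_drawn_from_valid_range(resolution):
    """Test that generated resolutions stay in the valid positive range.

    Given: an integer drawn from 1 to 10,000,000
    When: the property test runs
    Then: the value is strictly positive
    """
    # Arrange / Act / Assert (phases inseparable for a pure property check)
    assert resolution > 0


# Calling a @given-wrapped function with no arguments runs all examples:
test_resolution_should_be_positive_when_drawn_from_valid_range()
```

Pinning max_examples makes the sample size an explicit, reviewable choice rather than whatever the installed Hypothesis version defaults to.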
Tests calling the helper assert on SQL substrings, but the original
name "_transform_and_sql" obscured that the helper exercises three
stages — parse, transform, and generate — so any failure in the
generator surfaced as a transformer-test failure with no signal
about which stage was at fault.

Rename to _transpile_with_transformer and add a docstring that
states the helper runs the full pipeline and that test failures
should be triaged across all three stages, not assumed to be
transformer bugs. Update all 14 call sites.
Bundle of five minor cleanups flagged in the review:

- dialect.py: alphabetize the giql.expressions imports so GIQLCoverage
  slots between GIQLCluster and GIQLDistance (ruff/isort order).
- transformer.py: annotate COVERAGE_STAT_MAP as Final[dict[str, str]]
  to signal binding immutability.
- transformer.py: switch the CoverageTransformer class-docstring
  summary to the imperative mood required by the style guide.
- transformer.py: replace :return: with :returns: across the module
  (12 occurrences) to match the constitution's mandated reST tag.
- transformer.py: drop the redundant "AS <table>" alias emitted on
  the LEFT JOIN when no user-supplied alias is present — produces
  LEFT JOIN features rather than LEFT JOIN features AS features.
  Also removes the dead "source" literal fallback since an earlier
  fix raises on non-table FROMs before reaching this code.
The transformer was building COUNT/AVG/SUM/MIN/MAX as
exp.Anonymous(this=<name>, expressions=[...]) nodes. That form
bypasses SQLGlot's dialect-specific rendering — DuckDB and
PostgreSQL happen to render these five aggregates identically, but
any future backend (or dialect-specific NULL/DISTINCT handling) has
to re-learn each aggregate from the string name. Portable-principle
smell.

Swap to exp.Count/Avg/Sum/Min/Max typed nodes, looked up via a
private _AGG_NODE mapping keyed by the SQL name from
COVERAGE_STAT_MAP. Also collapse the four-branch if/else that
built the aggregate argument into a single-assignment block so
the aggregate type and its inner expression are decided in one
place.

COVERAGE_STAT_MAP remains public (it's imported by tests and acts
as the stat-name contract); the typed-node lookup is private.
Six minor clarifications flagged in the review:

- aggregation-operators.rst: replace the ambiguous
  "COVERAGE(interval, resolution)" example — where "resolution" read
  as either a placeholder or a named-parameter keyword — with the
  clearer "<bin_width>" placeholder.
- aggregation-operators.rst: mark resolution (required) and note
  that a non-positive value raises ValueError at transpile time.
- aggregation-operators.rst: add a compatibility note explaining
  COVERAGE relies on LATERAL + generate_series (DuckDB and
  PostgreSQL) and is not currently supported on SQLite.
- coverage.rst: add a biological-context paragraph at the top of
  the "Basic Coverage" section so the recipes open with a motivation
  matching the framing used in the sibling MERGE/CLUSTER recipes.
- coverage.rst: add a "Named Resolution Parameter" recipe
  demonstrating "resolution := 500" for symmetry with the other
  named-parameter examples — the reference page shows it but the
  recipes never did.
- coverage.rst: trim the one-character overshoot on the
  "Coverage of High-Scoring Features" section underline.
…elper

Bundle of seven related review findings on the integration correctness
suite:

- Extract the duplicated random-interval generation loop from the
  three scale tests into a new helper
  tests/integration/bedtools/utils/random_intervals.py so future
  scale tests share one deterministic source (NB3.5).

- Replace disjunctive "in (300, 301)" distance tolerances in the
  nearest upstream/downstream tests with exact value 301, matching
  the pinned bedtools >= 2.31 half-open gap convention (NB3.1).

- Replace the sort-then-zip pair comparison in the multi-query
  nearest test with set-equality on (a_name, b_name) tuples, which
  is what the docstring actually claims (NB3.2).

- Introduce module-level column-index constants (A_NAME, B_CHROM,
  B_NAME, DISTANCE) in test_correctness_nearest.py so assertions
  read by name instead of magic positional indices like r[9] (NB3.3).

- Strengthen test_merge_should_preserve_strand_when_stranded_true to
  route through _run_merge_comparison(..., strand_mode="same") and
  assert coordinate-level equality instead of just row count (NB3.4).

- Strengthen the nearest-then-filter-distance workflow to assert
  set equality on (a_name, b_name) pairs between GIQL and the
  distance-filtered bedtools output, not just row counts (NB3.6).

- Drop the brittle substring-match strand_mode inference in
  _run_intersect_comparison; accept strand_mode explicitly and
  forward it to the bedtools wrapper (NB3.7).

All 533 tests still pass.
Four related review findings on unit-test quality:

- Pin the bedtools distance value in the closest-basic test to the
  exact 2.31+ output (101 for a 100-base half-open gap) instead of
  "in (100, 101)". The k>1 test similarly pinned to the exact count
  bedtools 2.31.1 returns for tied candidates under -t first (NB4.1).

- Make load_intervals accept an empty interval list. The helper
  previously called conn.executemany on a zero-row list and let
  DuckDB raise InvalidInputException, which the unit test merely
  documented. Create the table and skip the insert when empty;
  rewrite the test to assert the empty-table success outcome (NB4.2).

- Strengthen two end-to-end COVERAGE tests to assert the full
  bin-tiling (len, start set, and per-bin values) rather than
  single-bin spot checks. Fix the zero-coverage-gaps fixture so
  there is actually a zero-coverage middle bin to observe (NB4.3).

- Derive VALID_STATS in tests/unit/test_expressions.py and
  tests/unit/test_transformer.py from COVERAGE_STAT_MAP so the
  property-test sample domain stays in sync with the implementation
  — adding a new stat key in the source automatically extends the
  property tests (NB4.4).